Energy Load Forecast in the Czech Republic: An Artificial Neural Networks Approach

Author

Camilo Gómez and Silvester van Koten

Published

July 11, 2023

Code
# Let's import packages
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import tensorflow as tf
from prettytable import PrettyTable
import plotly.express as px
import plotly
import plotly.io as pio
pio.templates.default = "seaborn"
pio.renderers.default = "plotly_mimetype+notebook_connected"
import os

import keras
from keras import models
from keras import layers
from keras import regularizers
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_absolute_percentage_error

sns.set()  # make plots nicer

1 Introduction

Accurate electricity load forecasting has become a crucial factor in the efficient operation of power systems worldwide. It helps utilities, grid operators, policymakers, and market participants make informed decisions regarding generation planning, resource allocation, pricing, and demand-side management. Traditionally, electricity load forecasts are built using stochastic time series models. For instance, the European Network of Transmission System Operators (ENTSO-E) uses the Mid-term Adequacy Forecast (MAF) methodology, which assumes a polynomial relation between temperature and energy load (citation). However, recent work has applied artificial intelligence methods to predict electricity load and prices, achieving a lower mean absolute percentage error than linear methods (Behm, Nolting, and Praktiknjo 2020). Behm, Nolting, and Praktiknjo (2020) propose an artificial neural network (hereafter, ANN) to create synthetic electricity load profiles for Germany, Sweden, Spain, and France in a 2025 scenario using meteorological and calendric data. In this paper, we adapt Behm, Nolting, and Praktiknjo (2020)’s method to the Czech market to answer three questions:

  1. What would the energy load profile in the Czech Republic have been in the absence of COVID-19 in 2020 and 2021?

  2. What would the energy load profile in the Czech Republic have been in the absence of the Russo-Ukrainian War in 2022 and 2023?

  3. What is the forecast for the energy load profile in the Czech Republic for 2025?

The past three years have presented significant challenges for the energy industry in Europe. First, the COVID-19 pandemic was perhaps the most significant global economic shock of the past century. It notably affected production, distribution, and consumption across the economy. Given that the energy industry runs through this entire chain, the outcome in the absence of the COVID-19 pandemic is worth considering. Second, the Russo-Ukrainian War represents a regional shock directly affecting the European energy market, given Russia’s role in energy production. Studying how the war has affected the energy load in the Czech Republic is relevant because it provides insights into how public policy might react in this and similar scenarios. Finally, it is interesting to look at how these two significant events have affected the medium-term prediction of the energy load profile and to compare it to the scenario without them.

2 Artificial Neural Network Approach

We follow the method developed by Behm, Nolting, and Praktiknjo (2020).

3 Data

Code
# data import (daily)
data_complete = pd.read_csv('final_db.csv')

# data import (hourly)
data_complete2 = pd.read_csv('final_db2.csv')
data_complete2 = data_complete2.drop('Unnamed: 0', axis=1)
# only consider dates before 2023
data_complete2 = data_complete2[data_complete2['Date'] < '2023-01-01']

load_mean = data_complete2['load'].mean()
load_sd = data_complete2['load'].std()

#print(data_complete2)

We collected hourly electricity load data from the ENTSO-E Transparency Platform (2022) and meteorological data from the Climate Change Service (C3S) (2023) for the Czech Republic for 2015-2022. Figure 1 shows the energy load series in megawatts (MW) for this period. A noteworthy feature of this series is its seasonality, with valleys in summer and peaks in winter (with exceptions on Christmas Eve and New Year’s Eve). Over this period, the average load is 7458 MW and the standard deviation is 1283 MW.

Code
load_plot = px.line(data_complete2, x='Date', y='load')
load_plot.update_xaxes(rangeslider_visible=True)
load_plot.update_yaxes(title='Energy load in megawatts (MW)')

load_plot.show()
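As a sketch of how the seasonal pattern and the summary statistics above can be checked, the snippet below uses a synthetic winter-peaking hourly series rather than the actual final_db2.csv, so the numbers are purely illustrative:

```python
import numpy as np
import pandas as pd

# Synthetic hourly load series (NOT the actual ENTSO-E data): a winter-peaking
# seasonal cycle around a 7458 MW mean, echoing the statistics in the text.
rng = pd.date_range('2015-01-01', '2015-12-31 23:00', freq='h')
seasonal = 7458 + 1200 * np.cos(2 * np.pi * rng.dayofyear / 365)  # peaks in winter
demo = pd.DataFrame({'Date': rng, 'load': seasonal})

load_mean = demo['load'].mean()
load_sd = demo['load'].std()

# Monthly means reveal the seasonality: winter months above the annual mean,
# summer months below it.
monthly = demo.groupby(demo['Date'].dt.month)['load'].mean()
print(round(load_mean), monthly.idxmax(), monthly.idxmin())
```

The same `groupby` on the real `data_complete2` would show the winter peaks and summer valleys visible in Figure 1.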

The meteorological data include the eastward and northward wind speed at 10 meters above the surface, the dew point temperature at 2 meters above the surface, the temperature at 2 meters above the surface, the surface pressure, and the total precipitation. Table 1 shows the descriptive statistics for these variables.

Code
climate_statistics = data_complete2[['u10', 'v10', 'd2m', 't2m', 'sp', 'tp']].agg(['mean', 'std', 'min', 'max'])

# Create a new DataFrame for the statistics
statistics_df = pd.DataFrame({'Variable': climate_statistics.columns})

# Add average, standard deviation, minimum, and maximum columns
statistics_df['Average'] = climate_statistics.loc['mean'].values
statistics_df['Standard Deviation'] = climate_statistics.loc['std'].values
statistics_df['Minimum'] = climate_statistics.loc['min'].values
statistics_df['Maximum'] = climate_statistics.loc['max'].values

# Display the table
#print(statistics_df.round(2))
Table 1: Descriptive statistics for meteorological variables in the Czech Republic, 2015-2022
Variable Unit Average Standard Deviation Minimum Maximum
Eastward wind speed at 10 meters above the surface Meters per second (m/s) 0.90 2.54 -7.65 12.54
Northward wind speed at 10 meters above the surface Meters per second (m/s) 0.17 1.99 -8.00 8.22
Dew point temperature at 2 meters above the surface Degrees Celsius (°C) 5.04 6.51 -21.63 20.14
Temperature at 2 meters above the surface Degrees Celsius (°C) 10.45 8.47 -20.66 37.16
Surface pressure Pascals (Pa) 98320.22 830.02 94125.44 101187.94
Total precipitation Meters (m) 0.00 0.00 -0.00 0.01
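ERA5 delivers the two temperature variables (t2m, d2m) in kelvin, while Table 1 reports them in degrees Celsius, so a conversion step like the following is assumed in the data preparation (the sample values are made up to echo the table averages):

```python
import pandas as pd

# One made-up sample row in ERA5 native units: temperatures in kelvin
sample = pd.DataFrame({'d2m': [278.19], 't2m': [283.60], 'sp': [98320.22]})

# kelvin -> degrees Celsius for the two temperature variables
for col in ['d2m', 't2m']:
    sample[col] = sample[col] - 273.15

print(sample.round(2))  # d2m = 5.04 °C, t2m = 10.45 °C
```

Surface pressure (sp, in pascals) and total precipitation (tp, in meters) are kept in their native ERA5 units, as in Table 1.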

4 Results

In this section we present the results of our ANN model. We train on the 2015-2018 data, holding out 20% as a test set, and use 2019 as a validation set to check for overfitting. In Section 6 we discuss all the model specifications we considered and compare their prediction metrics. We chose the specification with the lowest mean absolute percentage error, i.e. 2.54%. The selected model has four layers, 120 epochs, and a batch size of 32, and is trained with the RMSprop optimizer (Hinton 2012) at its default learning rate of 0.001.
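The model-selection metric can be reproduced with the scikit-learn helper imported at the top; note that it returns a fraction, so the 2.54 % reported above corresponds to a value of about 0.0254. The arrays below are illustrative, not the actual predictions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Illustrative actual and predicted hourly loads in MW
y_true = np.array([7000.0, 8000.0, 7500.0])
y_pred = np.array([7070.0, 7920.0, 7500.0])

# mean of |actual - predicted| / |actual|, returned as a fraction (not percent)
mape = mean_absolute_percentage_error(y_true, y_pred)
print(f'{100 * mape:.2f} %')
```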

Code
# select rows with date on or before 2018-12-31
database2 = data_complete2[data_complete2['Date'] <= '2018-12-31']
# drop rows with missing values
data2 = database2.dropna().copy()

# select rows with date after 2018-12-31
data2_2019_2022 = data_complete2[data_complete2['Date'] > '2018-12-31']
# drop rows with missing values
data2_2019_2022 = data2_2019_2022.dropna().copy()

# train test split
# data_X is a dataframe with all columns of 'data' dropping 'Date' and 'load' columns.
# data_y is a series with only column 'load' of 'data'.
data2_X, data2_y = data2.drop(["Date", "load", "price", "load_forecast"], axis = 1), data2.load
#drop missing
data2_X.dropna(inplace=True)
data2_y.dropna(inplace=True)

# train_test_split() splits the predictor variables and the target variable
# into training and testing sets; 20% of the data is held out for testing
# random_state=42 fixes the random seed

data2_train_X, data2_test_X, data2_train_y, data2_test_y = train_test_split(
    data2_X, data2_y, test_size = 0.2, random_state = 42)

# normalize the features
scaler2 = MinMaxScaler()  # scale each feature to [0, 1]
scaler2.fit(data2_train_X)  # learn min/max from the training data only
data2_train_X = scaler2.transform(data2_train_X)
data2_test_X = scaler2.transform(data2_test_X)
Code
# create a NN
# Model architecture using the Keras Sequential API, with four layers:
# three densely connected layers of 64 units with ReLU activations (the first
# also sets the input shape from data2_train_X), and a final single-unit layer
# with no activation function, so the output is a continuous value.
# compile() specifies the optimizer, the loss function (mean squared error),
# and the evaluation metrics for the model.

def build_model2():
    model = models.Sequential()
    model.add(layers.Dense(64, activation = 'relu', input_shape = (data2_train_X.shape[1],)))
    model.add(layers.Dense(64, activation = 'relu'))
    model.add(layers.Dense(64, activation = 'relu'))
    model.add(layers.Dense(1))
    model.compile(optimizer = 'rmsprop',
                  loss = 'mse',
                  metrics = ['mean_absolute_error','mean_squared_error','mean_absolute_percentage_error'])
    return model
Code
model_hourly6 = build_model2()

4.1 The impact of COVID-19 on energy load profile in the Czech Republic

Code
test2_2020_full = data2_2019_2022[data2_2019_2022['Year'] == 2020]

#Excluding date, price, load predicted columns
test2_2020_all = test2_2020_full.iloc[:, 4:]
#Only date column
test2_date_2020 = test2_2020_full.iloc[:, 0]

test2_2020 = test2_2020_all.copy()
test2_2020 = scaler2.transform(test2_2020)

pred2_2020 = model_hourly6.predict(test2_2020)
275/275 [==============================] - 1s 2ms/step

In this section we ask: what would the energy load profile in the Czech Republic have been in the absence of the COVID-19 outbreak in 2020 and its lingering effects in 2021?

For this purpose, we apply the ANN model described in the previous section to the actual input data for 2020. Because our model was trained on historical data prior to the COVID-19 pandemic, this exercise yields a counterfactual series showing how the energy load would have evolved without COVID-19.

The following figure shows both the predicted and actual series of the energy load in megawatts (MW) for the Czech Republic in 2020. The first result we want to highlight is that both series behave similarly until the beginning of the state of emergency in the Czech Republic on March 12, 2020. However, after mid-March, the predicted series shows higher values than the actual ones. Naturally, this behavior is due to the lockdown measures that affected all economic activity. According to the Czech National Bank, these measures had a negative impact on all industries, with the largest effect on services (especially tourism and restaurants), transport, and automobile production (Róbert et al. 2020). The second result is that, following the lifting of most restrictions in June, the two series began to gradually converge. Overall, we note that the measures to control the spread of COVID-19 in the Czech Republic significantly reduced the energy load in the March-June period of 2020 but had little effect in subsequent months.
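The lockdown-period gap described above can be quantified as the average difference between the counterfactual prediction and the actual load over the window of interest. A sketch with illustrative arrays (the real inputs are pred2_2020 and the actual load column):

```python
import numpy as np
import pandas as pd

# Illustrative daily values for the lockdown window; NOT the actual 2020 data
dates = pd.date_range('2020-03-12', periods=6, freq='D')
actual = np.array([6500.0, 6400, 6300, 6450, 6350, 6500])
predicted = np.array([7100.0, 7000, 6900, 7050, 6950, 7100])

# mean counterfactual-minus-actual gap, in MW and as a share of actual load
gap = pd.Series(predicted - actual, index=dates)
avg_gap = gap.mean()
rel_gap = avg_gap / actual.mean() * 100
print(f'average gap: {avg_gap:.0f} MW ({rel_gap:.1f} %)')
```

Restricting `dates` to March-June versus the rest of the year would make the contrast described in the text explicit.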

Code
# Reshape pred2_2020 to match the shape of test2_date_2020
pred2_2020 = np.reshape(pred2_2020, (-1,))

# Concatenate actual and predicted values into a single DataFrame
df = pd.DataFrame({'Date': test2_date_2020,
                   'Actual': test2_2020_full['load'],
                   'Predicted': pred2_2020})

# Plot the actual and predicted values
pred_plot_2020 = px.line(df, x='Date', y=['Actual', 'Predicted'], labels={'value': 'Load, MW'})

# Set plot title and axis labels
pred_plot_2020.update_layout(xaxis_title='Date', yaxis_title='Energy load in megawatts (MW)')

pred_plot_2020.update_xaxes(rangeslider_visible=True)

# Show the plot
pred_plot_2020.show()

For 2021, we note that actual values are, on average, slightly lower than predicted. The average energy load is 7554 MW (sd 1251.8) for the actual values and 7750 MW (sd 1270.3) for the predicted values. A possible explanation is that the economy adjusted to the new conditions and now uses fewer resources than the pre-pandemic relationship would imply.

Code
test2_2021_full = data2_2019_2022[data2_2019_2022['Year'] == 2021]

#Excluding date, price, load predicted columns
test2_2021_all = test2_2021_full.iloc[:, 4:]
#Only date column
test2_date_2021 = test2_2021_full.iloc[:, 0]

test2_2021 = test2_2021_all.copy()
test2_2021 = scaler2.transform(test2_2021)

pred2_2021 = model_hourly6.predict(test2_2021)
274/274 [==============================] - 1s 3ms/step
Code
# Reshape pred2_2021 to match the shape of test2_date_2021
pred2_2021 = np.reshape(pred2_2021, (-1,))

# Concatenate actual and predicted values into a single DataFrame
df = pd.DataFrame({'Date': test2_date_2021,
                   'Actual': test2_2021_full['load'],
                   'Predicted': pred2_2021})

# Plot the actual and predicted values
pred_plot_2021 = px.line(df, x='Date', y=['Actual', 'Predicted'], labels={'value': 'Load, MW'})

# Set plot title and axis labels
pred_plot_2021.update_layout(xaxis_title='Date', yaxis_title='Energy load in megawatts (MW)')

pred_plot_2021.update_xaxes(rangeslider_visible=True)

# Show the plot
pred_plot_2021.show()

4.2 The impact of the Russo-Ukrainian War on energy load profile in the Czech Republic

Code
test2_2022_full = data2_2019_2022[data2_2019_2022['Year'] == 2022]

#Excluding date, price, load predicted columns
test2_2022_all = test2_2022_full.iloc[:, 4:]
#Only date column
test2_date_2022 = test2_2022_full.iloc[:, 0]

test2_2022 = test2_2022_all.copy()
test2_2022 = scaler2.transform(test2_2022)

pred2_2022 = model_hourly6.predict(test2_2022)
274/274 [==============================] - 1s 2ms/step
Code
# Reshape pred2_2022 to match the shape of test2_date_2022
pred2_2022 = np.reshape(pred2_2022, (-1,))

# Concatenate actual and predicted values into a single DataFrame
df = pd.DataFrame({'Date': test2_date_2022,
                   'Actual': test2_2022_full['load'],
                   'Predicted': pred2_2022})

# Plot the actual and predicted values
pred_plot_2022 = px.line(df, x='Date', y=['Actual', 'Predicted'], labels={'value': 'Load, MW'})

# Set plot title and axis labels
pred_plot_2022.update_layout(xaxis_title='Date', yaxis_title='Energy load in megawatts (MW)')

pred_plot_2022.update_xaxes(rangeslider_visible=True)

# Show the plot
pred_plot_2022.show()

5 Discussion

References

Behm, Christian, Lars Nolting, and Aaron Praktiknjo. 2020. “How to model European electricity load profiles using artificial neural networks.” Applied Energy 277: 115564.
Climate Change Service, C3S. 2023. “ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate [Data set].” Copernicus Climate Change Service Climate Data Store. https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form.
ENTSO-E. 2022. “Electricity market price and load data [Data set].” https://transparency.entsoe.eu/.
Hinton, Geoffrey. 2012. “Neural Networks for Machine Learning: Lecture 6a, Overview of Mini-batch Gradient Descent.” http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.
Róbert, Jaromír Ambriško, Ondřej Gec, Jan Michálek, and Šolc. 2020. “Direct Impacts of the COVID-19 Pandemic on the Czech Economy.” https://www.cnb.cz/en/monetary-policy/inflation-reports/boxes-and-annexes-contained-in-inflation-reports/Direct-impacts-of-the-Covid-19-pandemic-on-the-Czech-economy.

6 Supplementary Material

6.1 ANN model specification

Code
# Try several epochs and batch_size specifications
# A rule of thumb is to start with roughly 3x the number of columns in the data
model_hourly = build_model2()
model_hourly.fit(data2_train_X, data2_train_y,
             epochs = 27, batch_size = 16, verbose = 0)

model_hourly2 = build_model2()
model_hourly2.fit(data2_train_X, data2_train_y,
             epochs = 45, batch_size = 16, verbose = 0)

model_hourly3 = build_model2()
model_hourly3.fit(data2_train_X, data2_train_y,
             epochs = 90, batch_size = 16, verbose = 0)

model_hourly4 = build_model2()
model_hourly4.fit(data2_train_X, data2_train_y,
             epochs = 100, batch_size = 16, verbose = 0)

model_hourly5 = build_model2()
model_hourly5.fit(data2_train_X, data2_train_y,
             epochs = 120, batch_size = 16, verbose = 0)
Code
## With the number of epochs fixed, let's see if changing batch_size helps
## 32 is a common rule of thumb and a good initial choice (powers of 2)
model_hourly6 = build_model2()
model_hourly6.fit(data2_train_X, data2_train_y,
             epochs = 120, batch_size = 32, verbose = 0)

model_hourly7 = build_model2()
model_hourly7.fit(data2_train_X, data2_train_y,
             epochs = 120, batch_size = 64, verbose = 0)

model_hourly8 = build_model2()
model_hourly8.fit(data2_train_X, data2_train_y,
             epochs = 120, batch_size = 128, verbose = 0)

model_hourly9 = build_model2()
model_hourly9.fit(data2_train_X, data2_train_y,
             epochs = 120, batch_size = 512, verbose = 0)
Code
scores21 = model_hourly.evaluate(data2_test_X, data2_test_y)
scores22 = model_hourly2.evaluate(data2_test_X, data2_test_y)
scores23 = model_hourly3.evaluate(data2_test_X, data2_test_y)
scores24 = model_hourly4.evaluate(data2_test_X, data2_test_y)
scores25 = model_hourly5.evaluate(data2_test_X, data2_test_y)
scores26 = model_hourly6.evaluate(data2_test_X, data2_test_y)
scores27 = model_hourly7.evaluate(data2_test_X, data2_test_y)
scores28 = model_hourly8.evaluate(data2_test_X, data2_test_y)
scores29 = model_hourly9.evaluate(data2_test_X, data2_test_y)
Code
# Let's create a table to compare different models.
# Define the table headers
table = PrettyTable()
table.field_names = ["Metric", "Model 1", "Model 2", "Model 3", "Model 4", 'Model 5']

# Add the evaluation scores to the table
table.add_row(["Loss", round(scores21[0], 2), round(scores22[0], 2), round(scores23[0], 2), round(scores24[0], 2), round(scores25[0], 2)])
table.add_row(["MAE", round(scores21[1], 2), round(scores22[1], 2), round(scores23[1], 2), round(scores24[1], 2), round(scores25[1], 2)])
table.add_row(["MSE", round(scores21[2], 2), round(scores22[2], 2), round(scores23[2], 2), round(scores24[2], 2), round(scores25[2], 2)])
table.add_row(["MAPE", round(scores21[3], 2), round(scores22[3], 2), round(scores23[3], 2), round(scores24[3], 2), round(scores25[3], 2)])
table.add_row(["Hourly data", 'YES', 'YES', 'YES', 'YES', 'YES'])
table.add_row(["epochs", '27', '45', '90', '100', '120'])
table.add_row(["batch_size", '16', '16', '16', '16', '16'])

# Print the table
print(table)
Code
# Let's create a table to compare different models.
# Define the table headers
table2 = PrettyTable()
table2.field_names = ["Metric", "Model 5", "Model 6", "Model 7", "Model 8", "Model 9"]

# Add the evaluation scores to the table
table2.add_row(["Loss", round(scores25[0], 2), round(scores26[0], 2), round(scores27[0], 2), round(scores28[0], 2), round(scores29[0], 2)])
table2.add_row(["MAE", round(scores25[1], 2), round(scores26[1], 2), round(scores27[1], 2), round(scores28[1], 2), round(scores29[1], 2)])
table2.add_row(["MSE", round(scores25[2], 2), round(scores26[2], 2), round(scores27[2], 2), round(scores28[2], 2), round(scores29[2], 2)])
table2.add_row(["MAPE", round(scores25[3], 2), round(scores26[3], 2), round(scores27[3], 2), round(scores28[3], 2), round(scores29[3], 2)])
table2.add_row(["Hourly data", 'YES', 'YES', 'YES', 'YES', 'YES'])
table2.add_row(["epochs", '120', '120', '120', '120', '120'])
table2.add_row(["batch_size", '16', '32', '64', '128', '512'])

# Print the table
print(table2)
Metric Model 1 Model 2 Model 3 Model 4 Model 5
Loss 171563.02 100688.93 94633.09 88417.38 64456.93
MAE 318.82 226.98 225.5 221.53 188.57
MSE 171563.02 100688.93 94633.09 88417.38 64456.93
MAPE 4.32 3.11 3.05 3.12 2.61
Hourly data YES YES YES YES YES
epochs 27 45 90 100 120
batch_size 16 16 16 16 16
Metric Model 5 Model 6 Model 7 Model 8 Model 9
Loss 64456.93 112494.2 133661.5 123432.58 281851.22
MAE 188.57 238.78 267.54 257.19 420.56
MSE 64456.93 112494.2 133661.5 123432.58 281851.22
MAPE 2.61 3.3 3.74 3.49 5.78
Hourly data YES YES YES YES YES
epochs 120 120 120 120 120
batch_size 16 32 64 128 512

6.2 Validation of the model using data from 2019

Code
test2_2019_full = data2_2019_2022[data2_2019_2022['Year'] == 2019]

#Excluding date, price, load predicted columns
test2_2019_all = test2_2019_full.iloc[:, 4:]

#Only date column
test2_date_2019 = test2_2019_full.iloc[:, 0]

test2_2019 = test2_2019_all.copy()

test2_2019 = scaler2.transform(test2_2019)
Code
model_hourly6.fit(data2_train_X, data2_train_y,
             epochs = 120, batch_size = 32, verbose = 0)

pred2_2019 = model_hourly6.predict(test2_2019)

Figure 2 shows our model’s forecast and the actual values of the energy load in megawatts (MW) for the Czech Republic in 2019. In general, we observe an accurate fit on the validation data.
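The goodness of fit on the validation year can be quantified with the error metrics imported earlier. A sketch with illustrative arrays (the real inputs are test2_2019_full['load'] and pred2_2019):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative actual and predicted hourly loads in MW
actual = np.array([7200.0, 6800.0, 7600.0, 7000.0])
predicted = np.array([7150.0, 6900.0, 7500.0, 7050.0])

mae = mean_absolute_error(actual, predicted)           # mean |error|, MW
rmse = np.sqrt(mean_squared_error(actual, predicted))  # root mean squared error, MW
print(f'MAE: {mae:.1f} MW, RMSE: {rmse:.1f} MW')
```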

Code
# Reshape pred2_2019 to match the shape of test2_date_2019
pred2_2019 = np.reshape(pred2_2019, (-1,))

# Concatenate actual and predicted values into a single DataFrame
df = pd.DataFrame({'Date': test2_date_2019,
                   'Actual': test2_2019_full['load'],
                   'Predicted': pred2_2019})

# Plot the actual and predicted values
pred_plot_2019 = px.line(df, x='Date', y=['Actual', 'Predicted'], labels={'value': 'Load, MW'})

# Set plot title and axis labels
pred_plot_2019.update_layout(xaxis_title='Date', yaxis_title='Energy load in megawatts (MW)')

pred_plot_2019.update_xaxes(rangeslider_visible=True)

# Show the plot
pred_plot_2019.show()
Code
## The goal is to predict 2019-2024. Let's take the 2015-2018 subset to train the model.
# select rows with date on or before 2018-12-31
data = data_complete[data_complete['Date'] <= '2018-12-31']

# select rows with date after 2018-12-31
data_2019_2022 = data_complete[data_complete['Date'] > '2018-12-31']
Code
#Database 1 (Daily)
# train test split
# data_X is a dataframe with all columns of 'data' dropping 'Date' and 'load' columns.
# data_y is a series with only column 'load' of 'data'.
data_X, data_y = data.drop(["Date", "load"], axis = 1), data.load

# train_test_split() splits the predictor variables and the target variable
# into training and testing sets; 20% of the data is held out for testing
# random_state=42 fixes the random seed

data_train_X, data_test_X, data_train_y, data_test_y = train_test_split(
    data_X, data_y, test_size = 0.2, random_state = 42)
Code
# normalize the data
scaler = MinMaxScaler()  # scale each feature to [0, 1]
scaler.fit(data_train_X)  # learn min/max from the training data only
data_train_X = scaler.transform(data_train_X)
data_test_X = scaler.transform(data_test_X)
Code
# fit() trains the model on the training data by minimizing the loss function
# with the specified optimizer. Parameters:
# epochs: the number of passes over the entire training dataset
# batch_size: the number of samples used in each gradient update
# verbose=0: no output is printed during training
#model_daily = build_model()
#model_daily.fit(data_train_X, data_train_y,
#             epochs = 46, batch_size = 16, verbose = 0)
Code
# evaluate() measures the performance of the trained NN model on the test data.
# It takes the test predictor variables (data_test_X) and the test target
# variable (data_test_y) as inputs and returns the values of the specified
# metrics: the loss, the mean squared error (MSE), and the mean absolute
# error (MAE) between the model's predictions and the actual target values.
#scores = model_daily.evaluate(data_test_X, data_test_y)
#print(scores)
10/10 [==============================] - 0s 2ms/step - loss: 446686.7812 - mean_absolute_error: 519.9492 - mean_squared_error: 446686.7812 - mean_absolute_percentage_error: 7.2066
[446686.78125, 519.9491577148438, 446686.78125, 7.2065606117248535]

7 Test 2020

7.1 Daily

7.2 Hourly



8 Comparison to ENTSO-E forecast (2016-2022)

Code
"""
# data import
entso_data_2022 = pd.read_csv('Total_Load_Day_Ahead_ Actual_2022.csv')

entso_data_2022 = entso_data_2022.rename(columns={'Time (CET/CEST)': 'Date'})
entso_data_2022 = entso_data_2022.rename(columns={'Day-ahead Total Load Forecast [MW] - Czech Republic (CZ)': 'Forecast'})
entso_data_2022 = entso_data_2022.rename(columns={'Actual Total Load [MW] - Czech Republic (CZ)': 'Actual'})

print(entso_data_2022)
"""
Code
"""
# Extract the first date and hour and convert it to datetime data type
entso_data_2022['Date'] = pd.to_datetime(entso_data_2022['Date'].str.split(' - ').str[0], format='%d.%m.%Y %H:%M')

print(entso_data_2022)
"""
Code
"""
# group by the date (without the time component) and calculate the mean value
entso_daily_data_2022 = entso_data_2022.groupby(entso_data_2022['Date'].dt.date)['Forecast'].mean().reset_index()

# convert the date column to datetime type
#entso_daily_data['Date'] = pd.to_datetime(entso_daily_data['Date'])

# set the date column as the index for each dataframe
#entso_daily_data.set_index('Date', inplace=True)

print(entso_daily_data_2022)
"""
Code
"""
combined_2022 = pd.merge(data_2022[['load','Date']].reset_index(), entso_daily_data_2022[['Forecast','Date']], how='inner', left_index=True, right_index=True)

combined_2022['pred'] = pred

print(combined_2022)
"""
Code
"""
# calculate the difference between actual and our predicted values
diff_1 = combined_2022['load'] - combined_2022['pred']
diff_2 = combined_2022['load'] - combined_2022['Forecast']

# plot predicted values
plt.plot(test_date, diff_2 , label='entso Predicted')

# plot the difference
plt.plot(test_date, diff_1, label='Our Prediction')

# set plot title and axis labels
plt.title('Difference between Actual and Predicted Load 2022')
plt.xlabel('Year - Month')
plt.ylabel('Load Difference, MW')

# show legend
plt.legend()

#save the plot
#plt.savefig('prediction_2022.png')

# display the plot
plt.show()
"""